Automatic image colorization is a particularly challenging problem. Because the problem is highly ill-posed and subject to multi-modal uncertainty, directly training a deep neural network usually leads to incorrect semantic colors and low color richness. Existing transformer-based methods can deliver better results but depend heavily on hand-crafted, dataset-level empirical distribution priors. In this work, we propose DDColor, a new end-to-end method with dual decoders for image colorization. More specifically, we design a multi-scale image decoder and a transformer-based color decoder. The former restores the spatial resolution of the image, while the latter establishes the correlation between semantic representations and color queries via cross-attention. The two decoders cooperate to learn semantic-aware color embeddings by leveraging multi-scale visual features. With the help of these two decoders, our method succeeds in producing semantically consistent and visually plausible colorization results without any additional priors. In addition, a simple but effective colorfulness loss is introduced to further improve the color richness of the generated results. Our extensive experiments demonstrate that the proposed DDColor significantly outperforms existing state-of-the-art works both quantitatively and qualitatively. Code will be made publicly available.
translated by 谷歌翻译
In this paper, we present a novel and effective framework, named 4K-NeRF, to pursue high-fidelity view synthesis in the challenging scenario of ultra-high resolutions, building on the methodology of neural radiance fields (NeRF). The rendering procedure of NeRF-based methods typically operates in a pixel-wise manner in which rays (or pixels) are treated independently in both the training and inference phases, limiting its ability to represent subtle details, especially when lifting to an extremely high resolution. We address the issue by better exploring ray correlation to enhance high-frequency details, benefiting from the use of geometry-aware local context. In particular, we use a view-consistent encoder to model geometric information effectively in a lower-resolution space and recover fine details through a view-consistent decoder, conditioned on ray features and depths estimated by the encoder. Joint training with patch-based sampling further allows our method to incorporate supervision from perception-oriented regularization beyond the pixel-wise loss. Quantitative and qualitative comparisons with modern NeRF methods demonstrate that our method can significantly boost rendering quality by retaining high-frequency details, achieving state-of-the-art visual quality in the 4K ultra-high-resolution scenario. Code is available at \url{https://github.com/frozoul/4K-NeRF}
Approximating radiance fields with volumetric grids is one of the promising directions for improving NeRF, represented by methods like Plenoxels and DVGO, which achieve super-fast training convergence and real-time rendering. However, these methods typically incur a tremendous storage overhead, costing up to hundreds of megabytes of disk space and runtime memory for a single scene. We address this issue by introducing a simple yet effective framework, called vector quantized radiance fields (VQRF), for compressing these volume-grid-based radiance fields. We first present a robust and adaptive metric for estimating redundancy in grid models and perform voxel pruning by better exploring the intermediate outputs of volumetric rendering. A trainable vector quantization is further proposed to improve the compactness of grid models. In combination with an efficient joint tuning strategy and post-processing, our method can achieve a compression ratio of 100$\times$, reducing the overall model size to 1 MB with negligible loss in visual quality. Extensive experiments demonstrate that the proposed framework achieves unrivaled performance and generalizes well across multiple methods with distinct volumetric structures, facilitating the wide use of volumetric radiance field methods in real-world applications. Code is available at \url{https://github.com/AlgoHunt/VQRF}
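The vector quantization step mentioned above can be illustrated with a minimal sketch. It shows only the generic idea of replacing each voxel feature vector by the index of its nearest codeword; it is not the VQRF implementation, the codebook here is fixed rather than trainable, and every name and value (`quantize`, `dequantize`, the toy `codebook` and `features`) is a hypothetical chosen for illustration.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map each feature vector to the index of its nearest codeword."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(f, codebook[k]))
        for f in features
    ]


def dequantize(indices: List[int], codebook: List[Vector]) -> List[List[float]]:
    """Recover approximate features by looking up each stored index."""
    return [list(codebook[i]) for i in indices]


# Toy data (hypothetical): three codewords, four 2-D "voxel features".
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
features = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.8], [0.05, 0.0]]

indices = quantize(features, codebook)      # one small integer per voxel
restored = dequantize(indices, codebook)    # lossy reconstruction
```

Storing one integer index per voxel plus a shared codebook, instead of a full feature vector per voxel, is where the compression in this kind of scheme comes from; the reconstruction is lossy, which is why a tuning stage such as the one the abstract mentions is typically needed.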
We consider the estimation of average treatment effects in observational studies without the standard assumption of unconfoundedness. We propose a new framework of robust causal inference under the general observational study setting with the possible existence of unobserved confounders. Our approach is based on the method of distributionally robust optimization and proceeds in two steps. We first specify the maximal degree to which the distribution of unobserved potential outcomes may deviate from that of observed outcomes. We then derive sharp bounds on the average treatment effects under this assumption. Our framework encompasses the popular marginal sensitivity model as a special case and can be extended to difference-in-differences and regression discontinuity designs as well as instrumental variables. Through simulation and empirical studies, we demonstrate the applicability of the proposed methodology to real-world settings.
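The well-developed ETS (ExponenTial Smoothing, or Error, Trend, Seasonality) method, which incorporates a family of exponential smoothing models in a state space representation, has been widely used for automatic forecasting. The existing ETS method uses information criteria for model selection, choosing the model with the smallest information criterion among all models fitted to a given time series. Under this model selection scheme, the ETS method suffers from high computational cost when applied to large-scale time series data. To address this issue, we propose an efficient approach to ETS model selection that trains classifiers on simulated data to predict the appropriate model component forms for a given time series. We provide a simulation study to show the model selection ability of the proposed approach on simulated data. We evaluate our approach on the widely used forecasting competition dataset M4, in terms of both point forecasts and prediction intervals. To demonstrate the practical value of our method, we showcase the performance improvements it delivers on a monthly hospital dataset.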
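To make the baseline selection scheme concrete, here is a minimal sketch of information-criterion-based selection between two exponential smoothing variants. It illustrates the existing scheme the abstract describes as costly, not the proposed classifier. Several simplifications are assumptions of this sketch: only two candidates (simple exponential smoothing and Holt's linear trend) stand in for the full ETS family, parameters are chosen by grid search on the sum of squared errors, the AIC uses the Gaussian-error form n·ln(SSE/n) + 2k, and only smoothing parameters are counted in k. All function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def ses_sse(y: List[float], alpha: float) -> float:
    """Sum of squared one-step-ahead errors for simple exponential smoothing."""
    level = y[0]
    sse = 0.0
    for obs in y[1:]:
        error = obs - level          # the forecast is the current level
        sse += error * error
        level += alpha * error       # error-correction update of the level
    return sse


def holt_sse(y: List[float], alpha: float, beta: float) -> float:
    """Sum of squared one-step-ahead errors for Holt's linear-trend method."""
    level, trend = y[0], y[1] - y[0]
    sse = 0.0
    for obs in y[1:]:
        forecast = level + trend
        error = obs - forecast
        sse += error * error
        level = forecast + alpha * error
        trend = trend + alpha * beta * error
    return sse


def aic(sse: float, n: int, k: int) -> float:
    """Gaussian-error AIC up to a constant; SSE is floored to avoid log(0)."""
    return n * math.log(max(sse, 1e-12) / n) + 2 * k


GRID = [i / 10 for i in range(1, 10)]  # candidate smoothing parameters


def select_model(y: List[float]) -> Tuple[str, Dict[str, float]]:
    """Fit both candidates by grid search and return the lowest-AIC model."""
    n = len(y) - 1  # number of one-step-ahead errors (needs len(y) >= 2)
    best_ses = min(ses_sse(y, a) for a in GRID)
    best_holt = min(holt_sse(y, a, b) for a in GRID for b in GRID)
    scores = {"ses": aic(best_ses, n, 1), "holt": aic(best_holt, n, 2)}
    return min(scores, key=scores.get), scores


trend_choice, _ = select_model([2.0 * t for t in range(20)])
flat_choice, _ = select_model([5.0] * 20)
```

Every candidate must be fitted to every series before the criterion can be compared, which is the cost that grows with the number of series and candidate models; a classifier that predicts the component form directly avoids that exhaustive fitting.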
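There has been a great deal of interest in deep learning in addressing the challenges of applying neural network models in real-world environments. In particular, three areas have received considerable attention: adversarial robustness, parameter sparsity, and output stability. Despite numerous attempts to solve these problems independently, very little work addresses the challenges simultaneously. In this paper, we address the problem of constructing holistic deep learning models by proposing a novel formulation that solves these issues in combination. Real-world experiments on both tabular and MNIST datasets show that our formulation is able to simultaneously improve the accuracy, robustness, stability, and sparsity of traditional deep learning models.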
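Conversational recommender systems (CRS) have become an emerging research topic that seeks to make recommendations through interactive conversations, and they generally consist of generation and recommendation modules. Prior work on CRS tends to incorporate more external and domain-specific knowledge, such as item reviews, to enhance performance. However, collecting and annotating external domain-specific information requires considerable human effort and reduces generalizability, and too much extra knowledge makes it harder to balance among the different sources. Therefore, we propose to fully discover and extract internal knowledge from the context. We capture both entity-level and context-level representations to jointly model user preferences for recommendation, where a time-aware attention mechanism is designed to emphasize recently appearing items in the entity-level representations. We further use pre-trained BART to initialize the generation module, which alleviates data scarcity and enhances context modeling. In addition to conducting experiments on a popular dataset (ReDial), we also include a multi-domain dataset (OpenDialKG) to show the effectiveness of our model. Experiments on both datasets show that our model achieves better performance on most evaluation metrics with less external knowledge and generalizes well to other domains. Additional analyses of the recommendation and generation tasks demonstrate the effectiveness of our model in different scenarios.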
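In this work, we introduce the Gradient Siamese Network (GSN) for image quality assessment. The proposed method is adept at capturing the gradient features between distorted images and reference images in the full-reference image quality assessment (IQA) task. We utilize central difference convolution to obtain both the semantic features and the detail differences hidden in an image pair. Furthermore, spatial attention guides the network to concentrate on regions related to image detail. For the low-level, mid-level, and high-level features extracted by the network, we design a multi-level fusion method to improve the efficiency of feature utilization. In addition to the common mean squared error supervision, we further consider the relative distance among batch samples and successfully apply a KL divergence loss to the image quality assessment task. We evaluated the proposed GSN algorithm on several publicly available datasets and demonstrated its superior performance. Our network won second place in Track 1 of the NTIRE 2022 Perceptual Image Quality Assessment Challenge.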
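Egocentric videos offer fine-grained information for high-fidelity modeling of human behaviors. Hands and interacting objects are a crucial aspect of understanding a viewer's behaviors and intentions. We provide a labeled dataset consisting of 11,243 egocentric images with per-pixel segmentation labels of hands and the objects they interact with across a diverse array of daily activities. Our dataset is the first to label detailed hand-object contact boundaries. We introduce a context-aware compositional data augmentation technique to adapt to out-of-distribution YouTube egocentric videos. We show that our robust hand-object segmentation model and dataset can serve as a foundational tool to boost or enable several downstream vision applications, including hand state classification, video activity recognition, 3D mesh reconstruction of hand-object interactions, and video inpainting of hand-object foregrounds in egocentric videos. The dataset and code are available at: https://github.com/owenzlz/egohos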
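Recently, deep models have established SOTA performance for low-resolution image inpainting, but they lack fidelity at the resolutions associated with modern cameras, such as 4K or more, and for large holes. We contribute an inpainting benchmark dataset of photos at 4K and above that is representative of modern sensors. We demonstrate a novel framework that combines deep learning and traditional methods. We use an existing deep inpainting model, LaMa, to fill the hole plausibly, establish three guide images consisting of structure, segmentation, and depth, and apply multiply-guided PatchMatch to produce eight candidate upsampled inpainted images. Next, we feed all candidate inpaintings through a novel curation module that chooses a good inpainting by column summation on an 8x8 antisymmetric pairwise preference matrix. The results of our framework are overwhelmingly preferred by users over 8 strong baselines, with improvements in quantitative metrics of up to 7.4 over the best baseline, LaMa. When paired with 4 different SOTA inpainting backbones, our technique improves each of them, such that ours is overwhelmingly preferred by users over a strong super-resolution baseline.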
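The curation step, choosing one candidate by column summation on an antisymmetric pairwise preference matrix, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' module: the sign convention (entry `pref[i][j] > 0` means candidate `j` is preferred over candidate `i`) is assumed here because the abstract does not specify it, and the pairwise preferences are faked from hypothetical per-candidate quality scores instead of coming from a learned comparator. All names are hypothetical.

```python
from typing import List


def preference_matrix(quality: List[float]) -> List[List[float]]:
    """Build an antisymmetric matrix: pref[i][j] = -pref[j][i].

    Stand-in for a learned pairwise comparator; here the preference of
    candidate j over candidate i is just a difference of toy scores.
    """
    n = len(quality)
    return [[quality[j] - quality[i] for j in range(n)] for i in range(n)]


def curate(pref: List[List[float]]) -> int:
    """Return the index of the candidate with the largest column sum."""
    n = len(pref)
    column_sums = [sum(pref[i][j] for i in range(n)) for j in range(n)]
    return max(range(n), key=column_sums.__getitem__)


# Eight candidates, matching the 8x8 matrix size in the abstract;
# the scores themselves are made up for this example.
scores = [0.2, 0.5, 0.1, 0.9, 0.4, 0.3, 0.7, 0.6]
pref = preference_matrix(scores)
best = curate(pref)
```

Summing a column aggregates how a single candidate fares against all the others, so one noisy pairwise comparison has limited influence on the final choice.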